An Overview of Corpus-Based Statistics-Oriented (CBSO) Techniques for Natural Language Processing
Abstract
This paper introduces a Corpus-Based Statistics-Oriented (CBSO) methodology, which attempts to avoid the drawbacks of both traditional rule-based approaches and purely statistical approaches. Rule-based approaches, with rules induced by human experts, had been the dominant paradigm in the natural language processing community. Such approaches, however, suffer from serious difficulties in knowledge acquisition in terms of cost and consistency, which makes such systems very difficult to scale up. Statistical methods, which can acquire knowledge automatically from corpora, are becoming increasingly popular, in part because they amend the shortcomings of rule-based approaches. However, most simple statistical models adopt almost nothing from existing linguistic knowledge; they often result in a large parameter space and thus require an unaffordably large training corpus, even for well-justified linguistic phenomena. The CBSO approach is a compromise between these two extremes of the knowledge acquisition spectrum. It emphasizes the use of well-justified linguistic knowledge in developing the underlying language model, and it applies statistical optimization techniques on top of high-level constructs, such as annotated syntax trees, rather than on surface strings, so that only a training corpus of reasonable size is needed and long-distance dependencies between constituents can be handled. This paper reviews corpus-based statistics-oriented techniques and introduces general techniques applicable to CBSO approaches. In particular, we shall address the following important issues so that general guidelines for developing a particular NLP system will be available to NLP researchers:
Similar resources
Corpus-Based Statistics-Oriented (CBSO) Machine Translation Researches in Taiwan
A brief introduction to the MT research projects in Taiwan is given in this paper. Special attention is given to the more and more popular corpus-based statistics-oriented (CBSO) approaches in MT researches. In particular, the parameterized two-way training philosophy in designing the second generation BehaviorTran, which is the first and the largest operational system in this area, is introduc...
Introduction to Corpus-Based Statistics-Oriented (CBSO) Techniques (Part II: Basic Concepts)
Corpus based coreference resolution for Farsi text
"Coreference resolution" or "finding all expressions that refer to the same entity" in a text, is one of the important requirements in natural language processing. Two words are coreference when both refer to a single entity in the text or the real world. So the main task of coreference resolution systems is to identify terms that refer to a unique entity. A coreference resolution tool could be...
Context-Based Integrative Educational Technique in Profession-Oriented Foreign Language Teaching (Academic Model United Nations)
The aim of the article is to examine the Academic Model United Nations (Model UN) as a context-based integrative educational technique in profession-oriented foreign language teaching (FLT); to point out the context-based integrative nature of profession-oriented language learning and highlight the importance of using product-based educational techniques in FLT for developing students’ future p...
Mining Aspects in Requirements
The early identification and documentation of crosscutting concerns enables better change management and traceability of requirements. Moreover, this also improves the early identification of candidate aspects in the design and implementation stages. Current techniques for identifying aspects in requirements are ineffective when requirements are complex or unstructured. This paper describes an ...
Journal title:
- IJCLCLP
Volume 1, Issue
Pages -
Publication date 1996